Analysing Java identifier names

نویسنده

  • Simon Butler
چکیده

Identifier names are the principal means of recording and communicating ideas in source code and are a significant source of information for software developers and maintainers, and the tools that support their work. This research aims to increase understanding of identifier name content types — words, abbreviations, etc. — and phrasal structures — noun phrases, verb phrases, etc. — by improving techniques for the analysis of identifier names. The techniques and knowledge acquired can be applied to improve program comprehension tools that support internal code quality, concept location, traceability and model extraction. Previous detailed investigations of identifier names have focused on method names, and the content and structure of Java class and reference (field, parameter, and variable) names are less well understood. I developed improved algorithms to tokenise names, and trained part-of-speech tagger models on identifier names to support the analysis of class and reference names in a corpus of 60 open source Java projects. I confirm that developers structure the majority of names according to identifier naming conventions, and use phrasal structures reported in the literature. I also show that developers use a wider variety of content types and phrasal structures than previously understood. Unusually structured class names are largely project-specific naming conventions, but could indicate design issues. Analysis of phrasal reference names showed that developers most often use the phrasal structures described in the literature and used to support the extraction of information from names, but also choose unexpected phrasal structures, and complex, multi-phrasal, names. Using Nominal — software I created to evaluate adherence to naming conventions — I found developers tend to follow naming conventions, but that adherence to published conventions varies between projects because developers also establish new conventions for the use of typography, content types and phrasal structure to support their work: particularly to distinguish the roles of Java field names.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving the Tokenisation of Identifier Names

Identifier names are the main vehicle for semantic information during program comprehension. Identifier names are tokenised into their semantic constituents by tools supporting program comprehension tasks, including concept location and requirements traceability. We present an approach to the automated tokenisation of identifier names that improves on existing techniques in two ways. First, it ...

متن کامل

Managing Class Names in Java Component Systems with Dynamic Update

This paper deals with class and interface name clashes in Java component systems that occur because of evolutionary changes during the lifecycle of a component application. We show that the standard facilities of the Java type system do not provide a satisfactory way to deal with the name clashes, and present a solution based on administering the names of classes and interfaces with a version i...

متن کامل

Advanced obfuscation techniques for Java bytecode

There exist several obfuscation tools for preventing Java bytecode from being decompiled. Most of these tools simply scramble the names of the identifiers stored in a bytecode by substituting the identifiers with meaningless names. However, the scrambling technique cannot deter a determined cracker very long. We propose several advanced obfuscation techniques that make Java bytecode impossible ...

متن کامل

Introducing Dynamic Name Resolution Mechanism for Obfuscating System-defined Names in Programs

Name obfuscation is a software protection technique, which renames identifiers in a given program, to protect the program from illegal cracking. The conventional methods replace names appearing in the declaration part with the meaningless ones. Therefore, the methods cannot be used to obfuscate names declared in system libraries, since changing such system-defined names significantly deteriorat...

متن کامل

The significance of user-defined identifiers in Java source code authorship identification

When writing source code, programmers have varying levels of freedom when it comes to the creation and use of identifiers. Do they habitually use the same identifiers, names that are different to those used by others? Is it then possible to tell who the author of a piece of code is by examining these identifiers? If so, can we use the presence or absence of identifiers to assist in correctly cl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016